PCAs based on genome composition for 5174 coronavirus spike protein sequences. Points are colour-coded and ellipses drawn according to different outcome variables, though the underlying PCA for each bias type is the same. Mouseover gives the outcome variable and virus name.
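As a rough illustration of how a composition-based PCA of this kind could be set up, here is a minimal sketch. It assumes dinucleotide frequencies as the composition features and uses a plain SVD for the PCA; the feature set, the helper names `dinucleotide_freqs` and `pca_scores`, and the toy sequences are all hypothetical and are not taken from the analysis described here.

```python
from itertools import product

import numpy as np

# All 16 dinucleotides in a fixed order, used as feature columns.
DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def dinucleotide_freqs(seq):
    """Return the 16 dinucleotide frequencies of a nucleotide sequence."""
    seq = seq.upper()
    counts = dict.fromkeys(DINUCS, 0)
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:  # skip pairs containing ambiguous bases
            counts[pair] += 1
    total = sum(counts.values())
    return [counts[d] / total if total else 0.0 for d in DINUCS]


def pca_scores(features, n_components=2):
    """Project the rows of `features` onto their first principal components."""
    X = np.asarray(features, dtype=float)
    X = X - X.mean(axis=0)  # centre each feature column
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)  # fraction of variance per component
    return scores, explained[:n_components]


# Hypothetical toy sequences, NOT real spike sequences.
toy_seqs = {
    "virus_a": "ATGTTTGTTTTTCTTGTTTTATTGCCACTAGTCTAA",
    "virus_b": "ATGTTCGTGTTCCTGGTGCTGCTGCCCCTGGTGTGA",
    "virus_c": "ATGAAACTTTTAATTGTTTTAACTTGTATTAGTTAA",
    "virus_d": "ATGGCCGGCGCCTGCGGCGACCGCGCCGGGTCCTAG",
}

features = [dinucleotide_freqs(s) for s in toy_seqs.values()]
scores, explained = pca_scores(features)

for name, (pc1, pc2) in zip(toy_seqs, scores):
    print(f"{name}: PC1={pc1:.3f} PC2={pc2:.3f}")
```

Because the scores depend only on the composition features, the same coordinates can be recoloured by any outcome variable without refitting, which is consistent with the note that the underlying PCA for each bias type is the same.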
Fairly clear clustering and separation of genera!
Very tight clusters, especially for SARS-CoV and SARS-CoV-2. MERS-CoV, SARS-CoV and SARS-CoV-2 are actually as distinct from each other as they are from other human CoVs. And there are many close animal viruses…
Separation seems driven almost entirely by stop codon use: alphas prefer TGA, betas and gammas prefer TAA, and deltas fall somewhere in between.
Epidemic coronaviruses are again strongly separated from each other, but not far from the other human viruses. Virtually all the human viruses prefer TAA, except HCoV-HKU1, which doesn’t seem fussy.
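The stop codon preferences noted above could be checked directly with a simple tally per genus. The following is a minimal sketch, assuming each record is a coding sequence that ends in its stop codon; the function names and the toy records are hypothetical and do not come from the dataset.

```python
from collections import Counter

STOP_CODONS = ("TAA", "TAG", "TGA")


def stop_codon(cds):
    """Return the terminal stop codon of a coding sequence, or None."""
    cds = cds.upper().replace("U", "T")  # accept RNA as well as DNA letters
    last = cds[-3:]
    # Require an in-frame sequence whose final codon is a stop codon.
    return last if len(cds) % 3 == 0 and last in STOP_CODONS else None


def stop_codon_usage(records):
    """Tally terminal stop codons per group from (group, cds) pairs."""
    usage = {}
    for group, cds in records:
        codon = stop_codon(cds)
        if codon is not None:  # ignore sequences with no valid stop codon
            usage.setdefault(group, Counter())[codon] += 1
    return usage


# Hypothetical toy records, NOT real spike sequences.
toy_records = [
    ("alpha", "ATGGCTTGA"),
    ("alpha", "ATGAAATGA"),
    ("beta", "ATGGCTTAA"),
    ("beta", "ATGCCCTAA"),
    ("beta", "ATGCCCTAG"),
]

for group, counts in stop_codon_usage(toy_records).items():
    print(group, dict(counts))
```

Since a single coding sequence has exactly one terminal stop codon, this feature is effectively categorical per sequence, which may help explain why it can dominate the separation in a codon-usage PCA.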
Clusters, but not as clear a separation here.
Surprisingly, the epidemic human coronaviruses are well separated from the endemic human coronaviruses here!